[57] ABSTRACT

The present invention enables a visually impaired user to 
freely and easily control hyper text. A voice synthesis 
program orally reads hyper text on the Internet. In synchronization
with this reading, the system focuses on a link
keyword that is most closely related to the location where 
reading is currently being performed. When an instruction 
"jump to link destination" is input (by voice or with a key), 
the program control can jump to the link destination for the 
link keyword that is being focused on. Further, the reading 
of only a link keyword can be instructed. 
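
The focus behavior described in this abstract can be illustrated with a minimal sketch. It assumes that "most closely related" means the link keyword most recently reached by the reader, falling back to the first link keyword when none has been reached yet. The names LinkKeyword, focused_link and jump, the use of character offsets as positions, and the example URLs are all hypothetical and are not taken from the patent.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkKeyword:
    position: int      # offset of the keyword in the text being read (assumed unit: characters)
    keyword: str       # the link keyword itself
    destination: str   # link destination information (for example, a URL)


def focused_link(links: List[LinkKeyword], reading_position: int) -> Optional[LinkKeyword]:
    """Return the link keyword that currently has the focus.

    `links` must be sorted by position. The focus is the last link keyword
    whose position is at or before the reading position; if reading has not
    reached any link keyword yet, the first one is used (an assumption).
    """
    if not links:
        return None
    positions = [link.position for link in links]
    index = bisect_right(positions, reading_position) - 1
    return links[max(index, 0)]


def jump(links: List[LinkKeyword], reading_position: int) -> Optional[str]:
    """Handle a "jump to link destination" instruction: return the destination to open."""
    link = focused_link(links, reading_position)
    return link.destination if link else None


links = [
    LinkKeyword(10, "IBM", "http://example.com/ibm"),
    LinkKeyword(40, "patents", "http://example.com/patents"),
]
destination = jump(links, reading_position=55)
```

A sorted list with a binary search keeps the lookup cheap enough to repeat every time the reading position advances.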
HYPER TEXT CONTROL THROUGH VOICE SYNTHESIS

CROSS REFERENCE TO RELATED APPLICATIONS

Applicant claims the foreign priority benefits under 35 U.S.C. 119 of Japanese Application No. 199319, which was filed Jul. 29, 1996. This Japanese application and its translation are incorporated by reference into this application.

FIELD OF THE INVENTION

The present invention relates to a voice synthesis system, or more specifically, to a method for detecting a word included in a sentence of hyper text, and for synthesizing voices in accordance with a voice attribute related to that word type; and a method for determining the type of word included in a sentence to be read and for controlling voice synthesis in accordance with that word type.

BACKGROUND OF THE INVENTION

A conventional voice synthesis program (or a voice synthesizer) reads an input text file having a voice attribute so described that its voice synthesis program can be processed.

For a voice synthesis program called "ProTALKER/2" ("ProTALKER" is a trademark of the IBM Corp.), a word called a "text embedded command/voice attribute" is embedded in text to control a voice attribute at the time of reading.

Assume that the text in which an embedded command is embedded is: Normal reading first. [*S9] Reading speed is increased here. [*P9] Voice pitch is changed to high. [*S0P0] [sentence illegible in source] [*Y0] Robot reading. [*S=P=Y=] Reading is returned to normal. [*F1] This is the phone number information. [*M1] Tell me the phone number of Mr. Kouichi Tanaka.

Upon receipt of this text, a voice synthesis apparatus recognizes "[*" as the head of the embedded command for instructing a voice attribute, and "]" as the termination of the embedded command. Since the above text does not designate a voice command at its beginning, it is read with the default settings. Then, the embedded command [*S9] is detected and the reading speed is set to 9. Following this, upon the detection of [*P9], the voice pitch is set to 9, and upon the detection of [*S0P0], the reading speed and the voice pitch are set to 0. Further, upon the detection of [*Y0], the intonation is set to 0, and upon the detection of [*S-P-Y-], the reading speed, the voice pitch and the intonation are reset to normal. Subsequently, upon the detection of [*F1], text is read using a female voice, and upon the detection of [*M1], text is read using a male voice.

Changes for a plurality of attributes can be included in a single embedded command using the format

[*<attribute symbol 1><set value 1><attribute symbol 2><set value 2> . . .]

The contents of the embedded commands for instructing voice attributes are as follows.

* Change in speaking speed
The speed is changed at the point where a command is encountered. Set symbol S; ten levels of set value, 0 (slow) to 9 (fast) (normal speed is 5).

* Change in voice pitch
The pitch is changed at the point where a command is encountered. Set symbol P; ten levels of set value, 0 (low) to 9 (high) (normal pitch is 2).

* Change in voice gain
The gain is changed at the point where a command is encountered. Set symbol G; ten levels of set value, 0 (small) to 9 (great) (normal gain is 9).

* Change in intonation
The intonation is changed at the point where a command is encountered. Set symbol Y; ten levels of set value, 0 (no intonation) to 9 (maximum intonation).

* Male voice
The voice is changed to a male voice at the point where a command is encountered. Set symbol M; set value 1.

* Female voice
The voice is changed to a female voice at the point where a command is encountered. Set symbol F; set value 1.

Conventionally, a technique exists for synthesizing a data file containing such voice attribute information from a text file including text attributes (style, font, underlining, etc.).

In Japanese Unexamined Patent Publication No. Hei 6-223070, for example, a method is disclosed for converting text attributes (style, font, underlining, etc.) of an input text file into voice attributes (speed, volume, etc.) by using a text-voice attribute conversion table, and for producing a speed command containing an embedded command for the voice attributes.

In addition, in Japanese Unexamined Patent Publication No. Hei 6-44247 is disclosed a method for referring to a control signal-voice synthesis signal conversion table to convert a text control signal in an input text file into a voice control signal including voice attributes.

These techniques enable the reading of a text while changes in the text attributes are reflected as voice attributes. During reading, the text attribute changes, which are generally displayed as font changes or as colors on a screen, can be expressed as voice attribute changes (the changes in the volume, pitch, intonation and speed) by a voice synthesis program [illegible].

There is a demand by users, such as visually handicapped persons who cannot use the visual information displayed on a display screen (and who hereinafter are referred to as visually impaired users), that hypertext programs, such as Web browsers, be prepared for their use.

Conventional hypertext programs (viewers for on-line help and Web browsers) only display text data on screen and do not read the text data.

Although the HTML used on the WWW (World Wide Web) of the Internet can handle voice data, advance preparation of such voice data is necessary, and since voice data takes several forms such as AU, WAV, RA, etc., software and hardware must be prepared for each form. Further, since more data is required for voice than for text, a longer transfer time is required for voice data. At present, however, as voice data is not yet popular, most of the HTML data is provided as sentence data. It will be convenient when the WWW data becomes available orally.

Another demand is that not only the information currently displayed on a screen be orally reproduced, but that a visually impaired user who so desires can also easily and freely perform Web surfing while using the voice information that is provided orally.

In Japanese Unexamined Patent Publication No. Sho 63-231493 is disclosed a related method for additionally inputting headline code at the beginning of each headline for input sentences, and for synthesizing only the contents of the headlines for voice reproduction during a fast forward and a fast reverse.

In Japanese Unexamined Patent Publication No. Hei 3-236099 is disclosed a method whereby an analysis result
of a plurality of phrases is stored, and the analysis result is output in accordance with a control command that specifies a reading position in a sentence and voice output, so that the reading position can be indicated exactly.

It is therefore one object of the present invention to provide a system for identifying in text a word type that has a specific feature, and for synthesizing while following the control procedures relevant to the word type.

It is another object of the present invention to provide a system by which a visually impaired user is enabled to freely and easily control hypertext.

SUMMARY OF THE INVENTION

When the above described "text attribute/voice attribute conversion" method is employed, a special word included in text can be orally read for identification. According to one aspect of the present invention, a system identifies a position where a voice synthesis program orally reads hyper text on the WWW of the Internet. Synchronized with the reading of the sentence, the focus is placed on the link keyword that is most relevant to the location at which the reading is currently being performed.

In a period following the reading of a specific keyword that continues until the reading of the next keyword begins, the focus is on the specific keyword that was read. When an instruction to jump to the link destination is input during this period, the link keyword can be designated, and the process can jump to the link destination for the keyword.

A word that has a link attribute (a link keyword) can be distinguished from another word by regarding it as a different voice attribute, or by inserting a sound (including a voice) designating a link keyword. Thus, while listening to sound without looking at the screen, only a simple manipulation is required to cause the reading process to jump to the link destination, and the reading of hyper text can continue. With this technique, a visually impaired user, such as a visually handicapped person, can easily use the Internet.

According to one aspect of the present invention, a method for controlling a hyper text including a plurality of link keywords, wherein each of the link keywords is related to a link destination information, comprises the steps of: (a) producing a word list managing information for specifying the link keyword and position information for specifying a position of the link keyword in the hyper text; (b) producing voice synthesis input information by converting the hyper text; (c) synthesizing the voice synthesis input information; (d) obtaining a voice synthesis pointer information related to a position in the hyper text related to a position currently synthesized; (e) determining a related link keyword by searching a position information in the hyper text related to the voice synthesis pointer information in the word list; (f) detecting user input instructing to jump to a link destination; and (g) accessing, in response to the user input, to a link destination by using link destination information related to the related link keyword.

The "information for specifying a link keyword" may be any information employed for specifying a special word, such as information for a pointer 301 in FIG. 6 indicating the location of a word in hyper text, position information 303 in FIG. 6 for a special word, or the name of a special word. Although the "hyper text" is preferably a single text object included in the hyper text, it may be a set of hyper text objects having a constant depth. The "position information for specifying a position of a link keyword in hyper text" may be information that enables the position of a link keyword to be identified, such as the location of the word or the block from the head, or the location of the keyword. The "word list" is not necessarily a table as described in the preferred embodiment of the present invention, and can take any form so long as the system can specify a link keyword and can identify the position of the link keyword.

The "voice synthesis input information" refers to a concept corresponding to an embedded command file in the preferred embodiment of the present invention. This information may also be a file in which no voice attribute information is embedded, so long as the file has a form appropriate for synthesizing. The "position in hyper text related to the position currently synthesized" is not necessarily information indicating the exact position currently synthesized, and may be information that is inaccurate to some degree. The "voice synthesis pointer information related to the position in hyper text related to the position currently synthesized" can be obtained not only from position information embedded in an embedded command, but also by measuring the amount of information, such as the number of words, for which voice synthesis (including an intermediate process) has been performed, to acquire the voice synthesis pointer.

"Determining a related link keyword" refers to a concept in the preferred embodiment of the present invention that is related to a link keyword located immediately before the one for current voice synthesizing (if such a link keyword is not present, the first link keyword). However, this can be changed in the design stage to a link keyword located immediately after the one for current voice synthesizing (if such a link keyword is not present, the last link keyword). The "user input instructing to jump to a link destination" is input performed not only by depressing a key on a keyboard assigned in advance, by clicking a button icon with a pointing device, or by selection in a pull down menu, but is also input effected by a user's voice. "Accessing to a link destination by using link destination information" can be performed by sending to a data input/output controller a linking instruction in the form of a command to be transmitted by an HTML analyzer to the data input/output controller, or by sending to an HTML analyzer an instruction to access the link destination in the form of information that indicates a link keyword is designated by a user input section.

According to another aspect of the present invention, a method for controlling an HTML file received from a Web server that includes a plurality of link keywords, each of which is related to link destination information, comprises the steps of: (a) receiving the HTML file from the Web server; (b) producing a word list managing information for specifying the link keyword, position information in the HTML file for specifying a position of the link keyword and the link destination information; (c) converting a start tag and an end tag of the link keyword included in the HTML file into voice attribute information, and correlating the voice attribute information as a voice attribute embedded command, with the position information in the HTML file of the link keyword, to produce a voice attribute embedded command file; (d) synthesizing the voice attribute embedded command file; (e) obtaining, in response to a position information related to the link keyword, a voice synthesis pointer information related to a position in the HTML file related to a position currently synthesized; (f) determining a related link destination information by searching a position information in the HTML file related to the voice synthesis pointer information in the word list; (g) detecting user input instructing to jump to a link destination; and (h) accessing, in response to the user input, to a link destination by using the related link destination information.
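
Steps (b) and (c) of the HTML-based method, producing a word list and converting link start and end tags into voice attribute embedded commands, can be sketched as follows. This is only an illustration under stated assumptions: the choice of [*F1] for the start tag and [*M1] for the end tag, the use of character offsets in the tag-free text as position information, and the names LinkToVoiceConverter and convert are hypothetical and are not taken from the patent. The sketch uses the HTMLParser class from the Python standard library.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# One word list entry: (position, link keyword, link destination).
WordListEntry = Tuple[int, str, str]


class LinkToVoiceConverter(HTMLParser):
    """Turn <a href=...> tags into embedded commands and collect a word list."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []              # pieces of the embedded command file
        self.word_list: List[WordListEntry] = []
        self._text_length = 0                   # characters of readable text seen so far
        self._href: Optional[str] = None        # destination of the link being read
        self._keyword: List[str] = []           # text collected inside the current link
        self._start = 0                         # position where the current link began

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href is not None:
            self._href, self._keyword, self._start = href, [], self._text_length
            self.parts.append("[*F1]")          # assumed: link keywords use the female voice

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.parts.append("[*M1]")          # assumed: return to the male voice
            self.word_list.append((self._start, "".join(self._keyword), self._href))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._keyword.append(data)
        self.parts.append(data)
        self._text_length += len(data)


def convert(html: str) -> Tuple[str, List[WordListEntry]]:
    """Return the embedded command text and the word list for an HTML fragment."""
    converter = LinkToVoiceConverter()
    converter.feed(html)
    converter.close()                           # flush any text still buffered by the parser
    return "".join(converter.parts), converter.word_list


command_text, word_list = convert(
    'See <a href="http://example.com/a">IBM</a> and '
    '<a href="http://example.com/b">patents</a>.'
)
```

Recording positions against the tag-free text keeps the word list independent of the embedded commands, so the same entries could be compared with a pointer that counts synthesized characters.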
According to an additional aspect of the present invention, a method for synthesizing a sentence including a plurality of special words, comprises the steps of: (a) producing a word list managing information for specifying the plurality of special words and position information for specifying a position of the plurality of special words in the sentence; (b) producing voice synthesis input information by relating a voice attribute with each of the special words, and by converting the sentence; (c) synthesizing the voice synthesis input information; (d) obtaining a voice synthesis pointer information related to a position in the sentence related to a position currently synthesized; (e) determining a special word related to a position currently synthesized by searching a position information in the sentence related to the voice synthesis pointer information in the word list; (f) detecting user input instructing to change the voice synthesis position; (g) obtaining, in response to the user input, from the word list a position information for special word adjacent to a specific word related to a position currently synthesized; and (h) synthesizing at a position related to the position information adjacent to the special word.

According to a further aspect of the present invention, an apparatus for controlling a hyper text including a plurality of link keywords, wherein each of the link keywords is related to a link destination information, comprises: (a) a word list managing information for specifying the link keyword and position information for specifying a position of the link keyword in the hyper text; (b) means for producing voice synthesis input information by relating a voice attribute with the link keyword and by converting the hyper text; (c) means for synthesizing the voice synthesis input information; (d) means for obtaining a voice synthesis pointer information related to a position in the hyper text related to a position currently synthesized; (e) means for determining a related link keyword by searching a position information in the hyper text related to the voice synthesis pointer information in the word list; (f) means for detecting user input instructing to jump to a link destination; and (g) means for accessing, in response to the user input, to a link destination by using link destination information related to the related link keyword.

"Correlating a voice attribute with a link keyword" relates to a voice synthesis embedded command in the preferred embodiment of the present invention, and is a concept that provides for the insertion of a word, which will be explained in the embodiment.

According to still another aspect of the present invention, an apparatus for controlling an HTML file received from a Web server that includes a plurality of link keywords, each of which is related to link destination information, comprises: (a) a communication controller for receiving the HTML file from the Web server; (b) means for producing a word list managing information for specifying the link keyword, position information in the HTML file for specifying a position of the link keyword and the link destination information; (c) means for converting a start tag and an end tag of the link keyword included in the HTML file into voice attribute information, and correlating the voice attribute information as a voice attribute embedded command, with the position information in the HTML file of the link keyword, to produce a voice attribute embedded command file; (d) a voice synthesizer for synthesizing the voice attribute embedded command file; (e) means for obtaining, in response to a position information related to the link keyword, a voice synthesis pointer information related to a position in the HTML file related to a position currently synthesized; (f) means for determining a related link destination information by searching a position information in the HTML file related to the voice synthesis pointer information in the word list; (g) a user input section for detecting user input instructing to jump to a link destination; and (h) means for accessing, in response to the user input, to a link destination by using the related link destination information.

According to a still further aspect of the present invention, an apparatus for synthesizing a sentence including a plurality of special words, comprises: (a) means for producing a word list managing information for specifying the plurality of special words and position information for specifying a position of the plurality of special words in the sentence; (b) means for producing voice synthesis input information by relating a voice attribute with each of the special words, and by converting the sentence; (c) means for synthesizing the voice synthesis input information; (d) means for obtaining a voice synthesis pointer information related to a position in the sentence related to a position currently synthesized; (e) means for determining a special word related to a position currently synthesized by searching a position information in the sentence related to the voice synthesis pointer information in the word list; (f) means for detecting user input instructing to change the voice synthesis position; (g) means for obtaining, in response to the user input, from the word list a position information for special word adjacent to a specific word related to a position currently synthesized; and (h) means for synthesizing at a position related to the position information adjacent to the special word.

According to yet another aspect of the present invention, provided is a recording medium to store a program, that is managed by a storage area a data processing system manages, for controlling a hyper text including a plurality of link keywords, wherein each of the link keywords is related to a link destination information, with the program comprising: (a) program code means for instructing the data processing system to produce a word list managing information for specifying the link keyword and position information for specifying a position of the link keyword in the hyper text; (b) program code means for instructing the data processing system to produce voice synthesis input information by relating a voice attribute with the link keyword and by converting the hyper text; (c) program code means for instructing the data processing system to synthesize the voice synthesis input information; (d) program code means for instructing the data processing system to obtain a voice synthesis pointer information related to a position in the hyper text related to a position currently synthesized; (e) program code means for instructing the data processing system to determine a related link keyword by searching a position information in the hyper text related to the voice synthesis pointer information in the word list; (f) program code means for instructing the data processing system to detect user input instructing to jump to a link destination; and (g) program code means for instructing the data processing system to access, in response to the user input, to a link destination by using link destination information related to the specified link keyword.

According to yet an additional aspect of the present invention, provided is a recording medium to store a program, that is managed by a storage area a data processing system manages, for controlling an HTML file received from a Web server including a plurality of link keywords, wherein each of the link keywords is related to a link destination information, with the program comprising: (a) program code means for instructing the data processing system to receive the HTML file from the Web server; (b)






5,983,184


program code means for instructing the data processing system to produce a word list managing information for specifying the link keyword, position information in the HTML file for specifying a position of the link keyword and the link destination information; (c) program code means for instructing the data processing system to convert a start tag and an end tag of the link keyword included in the HTML file into voice attribute information, and to correlate the voice attribute information as a voice attribute embedded command, with the position information in the HTML file of the link keyword, to produce a voice attribute embedded command file; (d) program code means for instructing the data processing system to synthesize the voice attribute embedded command file; (e) program code means for instructing the data processing system to obtain, in response to a position information related to the link keyword, a voice synthesis pointer information related to a position in the HTML file related to a position currently synthesized; (f) program code means for instructing the data processing system to determine a related link destination information by searching a position information in the HTML file related to the voice synthesis pointer information in the word list; (g) program code means for instructing the data processing system to detect user input instructing to jump to a link destination; and (h) program code means for instructing the data processing system to access, in response to the user input, to a link destination by using the related link destination information.

According to yet one further aspect of the present invention, provided is a recording medium to store a program, that is managed by a storage area a data processing system manages, for synthesizing a sentence including a plurality of special words, with the program comprising: (a) program code means for instructing the data processing system to produce a word list managing information for specifying the plurality of special words and position information for specifying a position of the plurality of special words in the sentence; (b) program code means for instructing the data processing system to produce voice synthesis input information by relating a voice attribute with each of the special words, and by converting the sentence; (c) program code means for instructing the data processing system to synthesize the voice synthesis input information; (d) program code means for instructing the data processing system to obtain a voice synthesis pointer information related to a position in the sentence related to a position currently synthesized; (e) program code means for instructing the data processing system to determine a special word related to a position currently synthesized by searching a position information in the sentence related to the voice synthesis pointer information in the word list; (f) program code means for instructing the data processing system to detect user input instructing to change the voice synthesis position; (g) program code means for instructing the data processing system to obtain, in response to the user input, from the word list a position information for special word adjacent to a specific word related to a position currently synthesized; and (h) program code means for instructing the data processing system to synthesize at a position related to the position information adjacent to the special word.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a hardware arrangement.

FIG. 2 is a block diagram illustrating processing components.

FIG. 3 is a diagram showing the procedures of the present invention for communication between a Web browser and a Web server.

FIG. 4 is a diagram showing one example of an HTML file that is converted according to the present invention.

FIG. 5 is a diagram showing a user interface for a Web browser of the present invention.

FIG. 6 is a diagram showing one example of a word list of the present invention.

FIG. 7 is a flowchart showing the processing of the present invention for producing a sentence that includes an embedded command.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hardware Arrangement

A preferred embodiment of the present invention will now be described while referring to the accompanying drawings. FIG. 1 is a schematic diagram illustrating the hardware arrangement for a voice synthesis system of the present invention. A voice synthesis system 100 includes a central processing unit (CPU) 1 and a memory 4. The CPU 1 and the memory 4 communicate with a hard disk drive 13 as an auxiliary storage device via a bus 2. A floppy disk drive (or a driver for an MO or a CD-ROM) 20 communicates with the bus 2 via a floppy disk controller 19.

A floppy disk (or a medium, such as an MO or a CD-ROM) is inserted into the floppy disk drive (or the driver for an MO or a CD-ROM) 20. On the floppy disk and the hard disk drive 13, and in a ROM 14, is stored code for a computer program that sends commands to the CPU 1, etc., while interacting with an operating system to carry out the present invention. This code is executed by being loaded into the memory 4. The code for the computer program may be compressed, or may be divided into a plurality of code segments and stored in a plurality of storage media.

The voice synthesis system 100 can be used as a system that includes user interface hardware. The user interface hardware components are, for example, a pointing device (mouse, joystick, etc.) 7 and a keyboard 6 used for input, and a display 12 used to provide visual data to a user. A printer and a modem can be connected, respectively, via a parallel port 16 and a serial port 15. The voice synthesis system 100 can communicate with another computer via the serial port 15 and the modem, or via a communication adaptor 18.

A voice signal that is obtained by D/A conversion at an audio controller 21 is transmitted via an amplifier 22 to a loudspeaker 23, through which the signal is output as a voice. The audio controller 21 can also perform A/D (analog/digital) conversion of voice information received from a microphone 24, and can fetch external voice information into the system.

As is described above, it can be easily understood that the present invention can be implemented by a normal personal computer (PC), a work station, or a combination of them. The above described components are only examples, and not all the components are required for the present invention. In particular, since the present invention is one for supporting a visually impaired user, the components such as a VGA 8, a VRAM 9, a DAC/LCDC 10, a display device 11 and a CRT 12, that are necessary for a user who is provided a visual display are not required. Since instructions for the system can be given orally, as will be described later, the keyboard 6, the mouse 7 and a keyboard/mouse controller 5 are also not required.

It is preferable that the operating system be Windows (a trademark of Microsoft Corp.), OS/2 (a trademark of IBM
Corp.), or an X-WINDOW system (a trademark of MIT) on AIX (a trademark of IBM Corp.), all of which support a standard GUI multi-window environment. However, the present invention can be implemented in a character based environment, such as PC-DOS (a trademark of IBM Corp.) or MS-DOS (a trademark of Microsoft Corp.), and is not limited to a specific operating system environment.

In FIG. 1 is shown the system in a stand-alone environment. However, the present invention may be implemented as a client/server system wherein a client machine is connected by a LAN to a server machine via Ethernet or a token ring; wherein on the client machine side are provided a user input section that will be described later, a synthesizer for receiving voice data from the server machine and reproducing it, and a loudspeaker; and wherein on the server machine side the other functions are provided. The functions provided on the server machine side and the client machine side can be changed as desired at the design stage. Various modifications for combinations of pluralities of machines and for the distribution of the functions are also included in the present invention.

System Configuration

The system configuration of the present invention will now be described while referring to the block diagram in FIG. 2. In this embodiment, the system comprises a communication controller 110, a Web browser 120 and a voice synthesis unit 150. These components can be independently provided by the hardware arrangement in FIG. 1, or can be provided by a common hardware component.

The communication controller 110 controls communications with another computer, such as a Web server. Its functions will be described in detail later.

The voice synthesis controller 151 transmits a sentence (embedded command file) including an embedded command received from the Web browser 120 to the language analyzer 153. The language analyzer 153 performs morphemic analysis of the received word by referring to the reading/accent dictionary 157 and the grammar stored in the grammar holding section 155, and divides the input sentence into appropriate morphemes.

The grammar holding section 155 stores the grammar referred to by the language analyzer 153 for the morphemic analysis. The reading/accent dictionary 157 stores "parts of speech," "reading" and "accents" that relate to words including Chinese characters and cursive kana characters.

The reading provider 159 uses the reading information stored in the reading/accent dictionary 157 to determine how to read the respective morphemes that are obtained by the language analyzer 153. The accent provider 161 uses the accent information stored in the reading/accent dictionary 157 to determine the accents for the respective morphemes that are obtained by the language analyzer 153.

In response to the reading determined by the reading provider 159, and the accent determined by the accent provider 161, the [illegible] generator 163 generates a voice [illegible] designated parameters [illegible]. When a voice command indicating the voice attribute is embedded in front of the word currently synthesized, that voice attribute is adopted for the "currently designated parameters." When such a voice command is not embedded, a default voice attribute that is set in the system in advance is adopted for the "currently designated parameters."

The voice generator 165 generates a voice signal in

_ _ . i . ,„„ . , , , , , accordance with a voice parameter generated by the param- 

The Web browser 120 includes a data mpuUoutput con- ctcr tor 143 ^ ^ fcrrcd embodimcnt of thc 

trolkr 121 an HTML file storage section 123, an HTML t inventi ^ ^ of ^ voice si ^ [& 

analyzer 125 a user input section 127, a focus controller rcalized b the audio controllcr 21 in FIG. 1 performing the 

} m t™*}* 1 }* 1 ' a ^lay section 133, a conversion D/A (digital analog) conversion. The voice generator 167 

table 135, and a focus pointer 139. generates a yoice m response tQ a yoice sigQa] generated by 

The input/output controller 121 accesses a Web server 60 40 the voice synthesizer 145. In the preferred embodiment of 

based on information for specifying a URL, and instructs the the present invention, the voice is released through the 

communication controller 110 to receive a HTML (Hyper amplifier 22 and the loudspeaker 23 in FIG. 1, 

Text Markup language) file from the Web server 60. ^ ^ m p[G 2 haye been 

The HTML file storage section 123 stores an HTML file described, they are theoretical functional blocks. They are 

that is received by the communication controller 110 and the 45 not ai ways individually implemented by hardware or 

data mput/output controller 121, and an HTML related file software, and can be provided by combined or common 

such as an image file. The HTML analyzer 125 analyses the hardware or software. 
HTML file, determines whether a file to be received is still 

present, and produces the word list 131 and an embedded Data Flow 

command file 141. 50 

Hie focus controller 129 receives position information t M ex P la ° ation ™& now be for data exchange 

(voice file information 171) reading is currently performed ^imai * e funcUonal blocks that were described ^er the 

by the voice synthesis unit 150 and information for the word sub-headmg < System Configuration." 

list 131, and specifies a word that should be currently d™-#-,« nf un^r r> i c-i 

focused on. The display section 133 displays the contents of 55 ReCep,1 ° D ° f "™ L RelaUDg FJe 

a HTML file and the word that is currently focused on. The The communication control 110 controls communications 

conversion table 135 is used to convert a keyword in an with the Web server 60, as is shown in FIG. 3. In FIG. 3, 

HTML file into an embedded command for instructing a first, information specifying a URL input at the user input 

voice attribute for the reading. section 127 is received via the input/output controller 121. 

The voice synthesis unit 150 is constituted by a voice 60 Based on this information, the Web server 60 is accessed, 

synthesis controller 151, a language analyzer 153, a gram- and an HTML (Hyper Text Markup Language) file is 

mar holding section 155, a reading/accent dictionary 157, a received from the Web server 60. 

reading provider 159, an accent provider 161, a parameter The HTML file received by the communication controller 

generator 163, a voice synthesizer 165, a voice generator 110 is stored in the HTML related file storage section 123. 

167, a voice synthesis pointer storage section 169, a voice 65 The HTML file is analyzed by the HTML analyzer 125. The 

file storage section 171, and a voice synthesis jump pointer HTML analyzer 125 analyzes the HTML file, and deter- 

storage section 173. mines whether or not a file to be received, such as an image 
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file, is still present. When a file to be received is present, the 
file name is specified and requested of the data input/output 
controller 121. The data input/output controller 121 again 
accesses the Web server 60 via the communication controller 
110, and receives an HTML relating file from the Web server 
60. The received HTML relating file is stored in the HTML 
related file storage section 123. 
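
The reception sequence described above (receive the HTML file, analyze it, then request any file that is still to be received) can be sketched as follows. This is only an illustration under stated assumptions: the names `RelatedFileFinder` and `receive_with_related_files` are hypothetical, the `fetch` callable stands in for the communication controller 110, a plain dictionary stands in for the storage section 123, and only IMG tags are treated as related files.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List


class RelatedFileFinder(HTMLParser):
    """Hypothetical stand-in for the analyzer: collects IMG SRC file names."""

    def __init__(self) -> None:
        super().__init__()
        self.pending: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.pending.append(src)


def receive_with_related_files(url: str, fetch: Callable[[str], str]) -> Dict[str, str]:
    """Fetch an HTML file, then every related file it still refers to."""
    storage: Dict[str, str] = {url: fetch(url)}  # plays the role of section 123
    finder = RelatedFileFinder()
    finder.feed(storage[url])
    finder.close()
    for name in finder.pending:
        if name not in storage:          # a file to be received is still present
            storage[name] = fetch(name)  # access the server again
    return storage
```

A caller would pass its own transport function as `fetch`; the sketch deliberately leaves networking out.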

Producing The Word List 






In FIG. 4 is shown a sample of an HTML file used in the 
preferred embodiment of the present invention. In this 
embodiment, the HTML file is input to produce a sentence 
including an embedded command. As is shown in FIG. 4, the 
HTML file in a text form includes tags, <TITLE>, <H1>, 
<H2>, <H3>, <H4>, <H5> and <H6>. 

The procedures for producing an embedded command file 
and a word list will be explained while referring to a sample 
of an HTML file. FIG. 4 is a diagram showing the contents 
of a sample HTML file in this embodiment. The HTML file 
in FIG. 4 is processed by the display section 133 and is 
shown to a user as a graphical image, as is shown in FIG. 5. 
The HTML tag begins with the start tag <XXXX> and ends
with the end tag </XXXX>. Thus, the system can recognize
the types of respective tags and can extract them.

A pointer 301 is a number allocated for a valid tag. In this
embodiment, when there is an overlapping tag, such as
"<html><TITLE>," only the last tag is valid and the other,
preceding tags are ignored. For example, when
"<H1>picture of Yamato</H1><IMG SRC="yamato.gif"><H2>H2<I>Italics</I> This is"
is input, the tag, "</H1><IMG SRC="yamato.gif">", is ignored. However,
when the tag for a link keyword and the other tag overlap as
in "<A HREF="THAT.HTM"><I>link keyword</I> This is
also </A>," the tag of the link keyword is not ignored, and
a list without a word is formed.

Position information 303 indicates the start position for a
word that relates to a valid tag. In a case where "<A
HREF="THAT.HTM"><I>link keyword</I> This is also
</A>" is input, "<A HREF="THAT.HTM">" is detected and a link
keyword flag indicating a word that relates to a link keyword
is set to 1, and a link keyword head flag indicating a word
that is the head of the link keyword is also set to 1. Further,
in response to the word that is the head of the link keyword,
link destination information is set. In this embodiment, the
word list is generated by the HTML analyzer 125.

Although information in FIG. 6 is managed in the word 
list of this embodiment, not all of the information is required 
for the present invention. The word list 131 is for managing 
position information where a word related to a link keyword 
exists. When the position information where a word related 
to a link keyword exists is managed, the present invention 
can be operated. 
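
A minimal sketch of such a word list is given below. It is a simplification, not the analyzer of this embodiment: every run of text becomes one entry, the special handling of overlapping tags is omitted, and the names `WordEntry`, `WordListBuilder` and `build_word_list` are hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class WordEntry:
    pointer: int                          # serial number of the entry
    text: str
    link_flag: int = 0                    # 1 when the word belongs to a link keyword
    link_head_flag: int = 0               # 1 when the word is the head of a link keyword
    link_destination: Optional[str] = None


class WordListBuilder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.entries: List[WordEntry] = []
        self._href: Optional[str] = None  # destination of the enclosing <A>, if any
        self._head_pending = False        # True until the first word of a link is seen

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._head_pending = self._href is not None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None
            self._head_pending = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        entry = WordEntry(pointer=len(self.entries) + 1, text=text)
        if self._href is not None:
            entry.link_flag = 1
            if self._head_pending:
                entry.link_head_flag = 1
                entry.link_destination = self._href
                self._head_pending = False
        self.entries.append(entry)


def build_word_list(html: str) -> List[WordEntry]:
    builder = WordListBuilder()
    builder.feed(html)
    builder.close()
    return builder.entries
```

Only the pointer, the two flags and the link destination are needed by the focusing and link functions; the remaining columns of FIG. 6 are left out of the sketch.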

Producing The Embedded Command File

The procedures for producing an embedded command file
will now be explained. An embedded command in this
preferred embodiment is produced by the HTML analyzer
125 by using the following two procedures.

Producing The Tentative File 

The HTML file shown in FIG. 4 is temporarily converted
into a form shown in Table 2. Unnecessary information, such
as "<html>" or "<IMG SRC="yamato.gif">," is removed
from the HTML file, and the valid start tag is converted
into voice attribute information based on the text attribute/
voice attribute conversion table (conversion table 135). The
end tag is converted into an embedded command to return
the voice attribute, which has been changed by the related
start tag, to a default value.

Although in the preferred embodiment of the present
invention the conversion is performed by the conversion
table 135, it can be performed by using the internal logic of
a conversion program, instead of using the conversion table.

Table 1 shows one example of the text attribute/voice 
attribute conversion table. 

TABLE 1

(text attribute)        (voice attribute)
default                 S5P2G8Y5  (speed 5, pitch 2, volume 8, intonation 5)
TITLE                   S5P1G8Y6  (speed 5, pitch 1, volume 8, intonation 6)
H1 (headline 1)         S5P3G8Y5  (speed 5, pitch 3, volume 8, intonation 5)
H2 (headline 2)         S5P4G8Y5  (speed 5, pitch 4, volume 8, intonation 5)
H3 (headline 3)         S5P5G8Y5  (speed 5, pitch 5, volume 8, intonation 5)
H4 (headline 4)         S5P6G8Y5  (speed 5, pitch 6, volume 8, intonation 5)
H5 (headline 5)         S5P7G8Y5  (speed 5, pitch 7, volume 8, intonation 5)
H6 (headline 6)         S5P8G8Y5  (speed 5, pitch 8, volume 8, intonation 5)
I (Italics)             S3        (speed 3)
B (Bold)                G9        (volume 9)
A HREF (link keyword)   S1        (speed 1)


This table may be fixed or may be user alterable. When the 
same text attribute appears continuously, it may happen that 
the same voice attribute will be assigned and a user will not 
be able to identify it (the sentence can not be divided). When 
the same text attribute appears sequentially, therefore, dif- 
ferent voice attributes can be alternately assigned, or oral 
reading can be performed at a constant interval to indicate 
separate parts of the sentence, or a plurality of voices can be 
inserted. 
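
As an illustration of how a conversion table of this kind might be applied, the sketch below derives the command for a start tag (only the attributes that differ from the default) and the command for the matching end tag (the same attributes restored to the default). The values are copied from Table 1; the function names `parse_attrs`, `start_command` and `end_command` are hypothetical, and single-digit attribute values are assumed.

```python
import re
from typing import Dict

DEFAULT = "S5P2G8Y5"

# Text attribute -> voice attribute, as listed in Table 1.
CONVERSION_TABLE: Dict[str, str] = {
    "TITLE": "S5P1G8Y6",
    "H1": "S5P3G8Y5", "H2": "S5P4G8Y5", "H3": "S5P5G8Y5",
    "H4": "S5P6G8Y5", "H5": "S5P7G8Y5", "H6": "S5P8G8Y5",
    "I": "S3", "B": "G9", "A": "S1",
}


def parse_attrs(command: str) -> Dict[str, int]:
    """Split a string such as 'S5P1G8Y6' into {'S': 5, 'P': 1, 'G': 8, 'Y': 6}."""
    return {symbol: int(value) for symbol, value in re.findall(r"([A-Z])(\d)", command)}


def start_command(tag: str) -> str:
    """Command for a start tag: only the attributes that differ from the default."""
    default = parse_attrs(DEFAULT)
    wanted = parse_attrs(CONVERSION_TABLE[tag])
    changed = [f"{s}{v}" for s, v in wanted.items() if default.get(s) != v]
    return "[*" + "".join(changed) + "]"


def end_command(tag: str) -> str:
    """Command for an end tag: the changed attributes returned to the default."""
    default = parse_attrs(DEFAULT)
    wanted = parse_attrs(CONVERSION_TABLE[tag])
    restored = [f"{s}{default[s]}" for s, v in wanted.items() if default.get(s) != v]
    return "[*" + "".join(restored) + "]"
```

Making the table a plain dictionary keeps it user alterable, which the text allows for.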

Table 2 is one example of a tentative file. The tentative file
can be produced by a flowchart in FIG. 7. In this file, only
a voice attribute for which the default voice attribute is
changed is inserted as an embedded command. However, a
complete voice command shown in Table 1, such as
"[*S5P1G8Y6]a title[*S5P1G8Y6]," can be regarded as an
embedded command.
[Table 2]

[*P1Y6]a title[*P2Y5]

[*P3]the picture of Yamato[*P2]

[*P4]H2[*Y8]Italics[*Y5]this is[*P2]

[*P6]H4[*Y8]Italics[*Y5]this is[*P2]

[*P8]H6[*Y8]Italics[*Y5]this is[*P2]

[*S1]This is a link keyword[*S5]

This is not a link keyword

[*S1]This is [*Y8]a link keyword[*Y5]too[*S5]

[*P7]

[*S1]HTTP[*S5] is, as is indicated by its name,
[*S1]an HTML[*S5] transfer protocol,
[*S1] that WWW [*S5] uses.
[*P2]
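
One way a tentative file in this notation might be post-processed is sketched below: commands that follow one another without intervening text are merged, the later value winning for attributes of the same type, and each resulting command receives a serial number "Dn". The function name `to_final_file` and the canonical symbol order in `ORDER` are assumptions made for the illustration.

```python
import re
from typing import Dict, Iterable, List

# Either an embedded command "[*...]" or a run of plain text.
TOKEN = re.compile(r"\[\*([A-Z0-9]+)\]|([^\[\]]+)")
ORDER = "SPGYMF"  # assumed order in which attribute symbols are written out


def to_final_file(tentative_lines: Iterable[str]) -> List[str]:
    final: List[str] = []
    pending: Dict[str, str] = {}  # attributes collected since the last text
    serial = 0

    def flush(text: str) -> None:
        nonlocal serial
        serial += 1
        attrs = "".join(s + pending[s] for s in ORDER if s in pending)
        final.append(f"[*{attrs}D{serial}]{text}")
        pending.clear()

    for line in tentative_lines:
        for command, text in TOKEN.findall(line):
            if command:
                # Same attribute type seen again before any text: last value wins.
                pending.update(re.findall(r"([A-Z])(\d)", command))
            elif text.strip():
                flush(text.strip())
    if pending:
        flush("")  # trailing command with no text after it
    return final
```

Applied to the first three lines of Table 2, the sketch merges "[*P2Y5]" and "[*P3]" into a single command placed in front of "the picture of Yamato".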

Final File 

The embedded commands where the tentative files continue
are arranged. In this case, when voice attributes of the same
type exist, the last voice attribute is regarded as valid. The
serial numbers, "D1," "D2," . . . , are inserted as embedded
position (order) information in the respective embedded
commands. The embedded position information corresponds
to the pointer in the word list 131 in FIG. 6. Finally the
embedded command is transmitted to the voice synthesis
unit 150 in the form shown in Table 3.
[Table 3]
[*P1Y6D1]This is a title
[*P3Y5D2]the picture of Yamato
[*P4D3]H2
[*Y8D4]Italics
[*Y5D5]This is
[*P6D6]H4
[*Y8D7]Italics
[*Y5D8]This is
[*P8D9]H6
[*Y8D10]Italics
[*Y5D11]This is
[*S1P2D12]This is a link keyword
[*S5D13]This is not a link keyword
[*S1D14]This is
[*Y8D15]a link keyword
[*Y5D16]too
[*S1P7D17]HTTP
[*S5D18]is as is indicated by its name
[*S1D19]an HTML
[*S5D20]transfer protocol
[*S1D21]that WWW
[*S5D22]uses.
[*P2D23]

The mode in which a set of the symbols indicating the types
of the voice attributes and their voice attribute values is
embedded into the voice command is merely an example.
The symbols and the voice attribute values may be so
embedded so long as the voice synthesis controller 151 of
the voice synthesis unit 150 can determine that the command
is a voice command, and can ascertain the type of a voice
attribute embedded in the voice command, its attribute
value, and the location in a sentence where the voice
attribute is to be changed. The locations of the voice attribute
values may be fixed, such that the first byte in a voice
command indicates "gender" and the second byte indicates
"speed," and the voice synthesis controller 151 may determine
the types of voice attributes in accordance with their
locations.

It is preferable that an embedded command be placed at
the head of a word that renders a voice attribute included in
the command valid. However, so long as the position of the
word that renders the voice attribute valid can be obtained
from the sentence, the embedded command does not have to
be placed at the head of the word. In this case, embedded in
a voice command is the position in the sentence of a word
that renders valid the voice attribute embedded in the voice
command, and the voice synthesis controller 151 can render
the voice attribute in the voice command valid when
synthesizing at the position in the sentence of the word that
renders the voice attribute embedded in the voice command
valid.

In the preferred embodiment of the present invention
producing a sentence that includes an embedded command
is a two step procedure. However, position information can
be embedded in a command at the step for producing a
tentative file and this file can be used as a final file, or an
HTML file can be converted at a conversion step into a final
file in which an embedded command is included. Further, a
sentence in which a word list and an embedded command
are included can be produced in the same procedure.

Voice Synthesis

The language analyzer 153 refers to the reading/accent
dictionary 157 and the grammar in the grammar holding
section 155 to perform morphemic analysis of a sentence
input by the voice synthesis controller 151, and divides the
input sentence to obtain appropriate morphemes. Although
division may be performed by a unit in which a
command is embedded, in this embodiment, a word that
ignores an embedded command is used to perform morphemic
analysis.

Therefore, when word "[*S1P7D17]HTTP[*S5D18]is as
is indicated by its name," is input, voice synthesis is performed
for the units "HTTP is/as is/indicated by/its name,"
instead of "HTTP/is/as is/indicated by/its name."

In the above case, the form of the data transmission to the
parameter generator 163 can be changed at the design stage,
for example: "[*S1P7D17]HTTP/[*S5D18]is/as is/indicated
by/its name" or "[*S1P7D17]HTTP is/[*S5D18]as
is/indicated by/its name".

In the preferred embodiment of the present invention, an
embedded command for a default voice attribute is inserted
after the change of the voice attribute has been completed.
However, the present invention can be implemented by
inserting an embedded command only into a word for which
the voice attribute should be changed, by inserting a special
word or symbol indicating that a change in the voice attribute has
been completed into the location where such a change is
effected, and by the parameter generator 163 detecting the
special word to automatically generate a parameter for a
default voice attribute. In this case, the parameter generator
163 generates a voice parameter to synthesize by using
currently specified parameters, "speed," "pitch," "volume,"
"intonation" and "gender," in accordance with the reading
that is determined by the reading provider 159 and the accent
determined by the accent provider 161. When a voice
command indicating a voice attribute is embedded in front
of a word for synthesizing, that voice attribute is adopted for
the "currently designated parameters." When such a voice
command does not exist, a default voice attribute value that
is set for the system in advance is used for the "currently
designated parameters."

The voice synthesizer 165 generates a voice signal in
accordance with the voice parameter produced by the
parameter generator 163. In the preferred embodiment of the
present invention, this generation is conducted by the D/A
(digital/analog) conversion at the audio controller 21 in FIG.
1. The voice generator 167 produces a voice that relates to
the voice signal generated by the voice synthesizer 165. In
this embodiment, this is implemented by the amplifier 22
and the loudspeaker 23 in FIG. 1. Since the voice synthesization
is performed in response to the type of a special
word that is included in the text, a user can identify the type
of the special word merely by listening and without using
vision, so that he or she can understand the contents of the
text.

Although, in this embodiment, the type of a special word
is expressed by the alternation of the voice attribute, it is
possible for a visually impaired user to recognize the type of
a word without relying on a change in the voice attribute.
Table 4 shows one example of a sentence in which is
embedded the embedded command of the present invention.
In this example, the word "link keyword" is inserted immediately
before an actual link keyword to enable a visually
impaired user to identify the position of the link keyword.
Voice attribute command "FM1" in this table is a command
to instruct to change to a female voice when the oral reading
is currently being performed using a male voice, and to
instruct to change to a male voice when the reading is
performed using a female voice. With this command, it is
possible to identify whether a word "link keyword" is one
that originally existed in the HTML file, or one that has
been inserted.
[Table 4]

[*D1]This is a title
[*D2]the picture of Yamato
[*D3]H2
[*D4]Italics
[*D5]This is
[*D6]H4
[*D7]Italics
[*D8]This is
[*D9]H6
[*D10]Italics
[*D11]This is
[*S1FM1D12]link keyword
[*D12]This is a link keyword
[*S5FM1D13]This is not a link keyword
[*S1FM1D14]link keyword
[*D14]This is
[*D15]a link keyword
[*D16]too
[*S1FM1D17]link keyword
[*D17]HTTP
[*S5FM1D18]is as is indicated by its name
[*S1FM1D19]link keyword
[*D19]an HTML
[*S5FM1D20]transfer protocol
[*S1FM1D21]link keyword
[*D21]that WWW
[*S5FM1D22]uses.
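
How the "currently designated parameters" might be tracked while walking a file of embedded commands can be sketched as follows. The sketch assumes that an attribute stays in force until a later command changes it, that the defaults are those of Table 1, and that only speed, pitch, volume and intonation are tracked (gender commands such as "FM1" are skipped). The name `designate_parameters` is hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

DEFAULTS: Dict[str, int] = {"S": 5, "P": 2, "G": 8, "Y": 5}  # default row of Table 1
LINE = re.compile(r"\[\*([A-Z0-9]*)\](.*)")


def designate_parameters(
    lines: Iterable[str],
) -> List[Tuple[Optional[int], str, Dict[str, int]]]:
    """Return (position, text, parameters in force) for every line."""
    current = dict(DEFAULTS)
    result = []
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # not an embedded-command line
        command, text = match.groups()
        position = None
        for symbol, value in re.findall(r"([A-Z])(\d+)", command):
            if symbol == "D":
                position = int(value)         # embedded position information
            elif symbol in current:
                current[symbol] = int(value)  # adopt the embedded voice attribute
        result.append((position, text, dict(current)))
    return result
```

Each returned tuple carries its own copy of the parameters, so later commands do not alter results already produced.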

Special Word Focusing Synchronization

In the preferred embodiment of the present invention, the
focus controller 129 in the Web browser 120 knows the
location at which the reading is currently performed. More
specifically, the parameter generator 163 manages a voice
information file 171 for which the reading and the accent are
provided and that is divided into morphemes, and the above
described position information relates to each morpheme.
The parameter generator 163 transmits voice files to the
voice synthesizer 165 as morphemic units in accordance
with the operation of a first-in and first-out system. The
parameter generator 163 stores, as a voice synthesis pointer
169, the position information related to the morphemic units
of the voice files. The focus controller 129 of the Web
browser 120 can obtain the information for the voice synthesis
pointer 169 via the voice synthesis controller 151 in
the voice synthesis unit 150.

The process sequence will be described by employing the
previously described embedded command file. For a word
block of "[*D17]an HTTP[*S5D18] is as is indicated by its
name," the following voice information, for which the
reading and the accent are provided by the reading provider
159 and the accent provider 161, is transmitted to the
parameter generator 163.

"[*D17]an HTTP is, [*S5D18] as is, indicated by, its name,"
(the accent symbols are omitted here.)

The parameter generator 163 converts this information
into a web form file in which a parameter is set that is in
accordance with the voice attribute of the embedded
command, and stores it in the voice file storage section 171.
The stored voice file can be expressed as follows.
"[*D17]an HTTP is, [*D18]as is, [*D18] indicated by,
[*D18]its name."



To read "as is," for example, the parameter generator 163 
transmits this voice file to the voice synthesizer 165, and sets 
the value for "D" (e.g., 18) to the voice synthesis pointer 
169. The position information is not necessarily held by the 
units of morphemes, and the morphemes "[*D17]an HTTP 
[*D18]is, as is, indicated by, its name" can be separately 
embedded. 

The focus controller 129 receives the information for the 
voice synthesis pointer 169 via the voice synthesis controller 
151. Based on this information, the focus controller 129 
refers to the word list, searches for an entry of 18 indicated 
by the pointer 301, and is aware that the 100th word "is as 
indicated by its name" is being read. 

The focus controller 129 transmits the position informa- 
tion for the word list to the display section, which then 
displays the location where the reading is currently per- 
formed in such a way, such as by employing highlighting in 
a display, that a user can easily identify it. The highlighting 
in the display is performed in synchronization with the voice 
synthesization, and controls, such as an insertion of a delay 
time, can be performed. 
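
The first-in, first-out hand-off and the accompanying pointer update can be sketched as below. The class name `MorphemeQueue` and its two callbacks are hypothetical: `synthesize` stands in for the voice synthesizer 165 and `notify_focus` for the path by which the focus controller 129 learns the current position.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class MorphemeQueue:
    """Hands morphemic units to the synthesizer in first-in, first-out order."""

    def __init__(self, synthesize: Callable[[str], None],
                 notify_focus: Callable[[int], None]) -> None:
        self._queue: Deque[Tuple[int, str]] = deque()
        self._synthesize = synthesize
        self._notify_focus = notify_focus
        self.voice_synthesis_pointer: Optional[int] = None

    def store(self, position: int, morpheme: str) -> None:
        self._queue.append((position, morpheme))

    def send_next(self) -> bool:
        """Send the oldest unit; return False when nothing is left."""
        if not self._queue:
            return False
        position, morpheme = self._queue.popleft()
        self.voice_synthesis_pointer = position  # value of "D" for this unit
        self._notify_focus(position)             # lets the display follow the reading
        self._synthesize(morpheme)
        return True
```

Because the focus notification is issued together with each unit, the highlighted word follows the voice; a delay could be inserted in `notify_focus` if desired.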

Link Function 

According to the present invention, a link keyword is
specified that is related to the location where the reading is
performed. When the keyword is selected, the process jumps
to a link destination that relates to the keyword. More
specifically, the focus controller 129 compares the voice
synthesis pointer 169 with the pointer information and the
link head flag information in the word list 131, to specify
the link destination information 311 that relates to the location
where the reading is being performed. When, for example,
the voice synthesis pointer indicating the location where the
reading is performed is "16," the focus controller 129 selects
a pointer that has a maximum value that is less than 16 and
that has a link head flag set to "1."

In this case, "14" is selected. The focus controller 129
stores the pointer information in the focus pointer 139.
Since the pointer information is used to specify the link
destination, link destination information, such as
"THAT.HTM," can be stored directly. In this embodiment,
the focus controller 129 selects a pointer that has a maximum
value less than the value of the voice synthesis pointer
and having a link head flag set to "1," and stores the selected
pointer as a focus pointer. When such a pointer does not
exist, a pointer that has the smallest value and that has a link
head flag set to "1" is stored as a focus pointer.
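
The selection rule just described can be written compactly. In the sketch below, `select_focus_pointer` is a hypothetical name, and `link_heads` is assumed to hold the pointers of all word list entries whose link head flag is set to "1."

```python
from typing import Iterable, Optional


def select_focus_pointer(link_heads: Iterable[int], synthesis_pointer: int) -> Optional[int]:
    """Largest link-head pointer below the reading position, else the smallest one."""
    heads = sorted(link_heads)
    earlier = [p for p in heads if p < synthesis_pointer]
    if earlier:
        return earlier[-1]
    return heads[0] if heads else None  # None when the page has no link keyword
```

With the link heads of Table 3 (12, 14, 17, 19 and 21), a reading position of 16 selects 14, as in the example of the text.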

When key input instructing a "jump to a link destination" 
has been detected during the reading, link destination infor- 
mation that relates to the focus pointer can be specified. The 
link method can be used for conventional hyper text to jump 
to the link destination for its keyword. Therefore, while 
listening to a voice, a visually impaired user need only 
perform a simple manipulation to jump to a link destination 
and to continue the reading of the hyper text. In the preferred 
embodiment of the present invention, in response to the 
detection of the above user's input, the contents of a buffer 
stored in various files, such as the embedded command file 
141 and the voice file 171, are cleared, and various types of 
information, such as the focus pointer 139, the voice synthesis
pointer 169 and the voice jump pointer 173, are
initialized.

Other Functions 

In the preferred embodiment of the present invention, the
movement and selection of the link keyword can be freely
performed by allocating keys having the following functions
on the keyboard.

key 1: reading of link keyword focused on
key 2: reading beginning at link keyword focused on
key 3: jump to link destination
key 4: forward movement of link keyword
key 5: backward movement of link keyword
key 6: play
key 7: stop
key 8: pause
key 9: fast forward
key 10: fast rewind
key 11: reading only of link keyword focused on

Keys 6 through 10 can be provided by using the conventional
method, and key 3, jump to the link destination, has
been previously described. Thus, only keys 1, 2, 4, 5 and
11 will now be explained.

When key 1 is depressed, the user input section 127
detects this event, and transmits this information to the focus
controller 129. Upon receipt of this information, the focus
controller 129 obtains pointer information stored in the
focus pointer 139. In addition, the focus controller 129 refers
to the word list 131 to specify a word to be orally read.
When, for example, the contents of the focus pointer is "14,"
it is ascertained, by referring to the link flag, that succeeding
words "14" through "16" that have link flags that are set to
"1" and link head flags that are not set to "1" are those to be
read.

The focus controller 129 instructs the voice synthesis
controller 151 to synthesize for words whose position information
is 14 through 16. The voice synthesis controller 151
temporarily stores, as the voice jump pointer 173, the start
position and the end position of the position information for
which voice synthesis should be performed. Since words
that relate to one link destination continuously exist, the start
position and the number of words may be stored instead of
the start position and the end position. Further, since the
number of words and the end position are data that are
available by referring to the link flag and the link head flag
in the word list, only the start position may be stored.

The voice synthesis controller 151 examines the contents
of the voice synthesis file storage section 171 to determine
whether or not a voice file having the position information
is present. When such a voice file remains, a corresponding
voice file is extracted (voice files except for the corresponding
voice file may be erased from the voice file storage
section 171), and only voice files whose position information
is 14 to 16 are transmitted to the synthesizer 165. When
no corresponding voice file is found, the voice file storage
section 171 is cleared, a corresponding embedded command
is extracted from the embedded command file, and voice
synthesization is performed to read a specified link keyword.
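
The range of words read for key 1 follows from the two flags alone, which the sketch below illustrates. The function name `link_keyword_range` is hypothetical, and the word list is assumed to be a mapping from pointer to a (link flag, link head flag) pair.

```python
from typing import Dict, Tuple


def link_keyword_range(word_list: Dict[int, Tuple[int, int]],
                       focus_pointer: int) -> Tuple[int, int]:
    """Start and end positions of the link keyword that begins at focus_pointer."""
    end = focus_pointer
    for pointer in sorted(p for p in word_list if p > focus_pointer):
        link_flag, link_head_flag = word_list[pointer]
        if link_flag == 1 and link_head_flag != 1:
            end = pointer  # still inside the same link keyword
        else:
            break          # next link keyword or ordinary text reached
    return focus_pointer, end
```

Since the end position can always be recomputed this way, storing only the start position in the voice jump pointer is sufficient, as the text notes.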

When key 2 is depressed, almost the same procedures are
performed as for key 1. When key 2 is depressed, the
user input section 127 detects this event, and transmits it to
the focus controller 129. In response to this, the focus
controller 129 obtains pointer information stored in the
focus pointer 139. Further, the focus controller 129 refers to
the word list 131 to specify a word to be orally read. When,
for example, the contents of the focus pointer is "14," it is
ascertained that words whose position information is "14" to
"9999" ("9999" indicates the last sentence in this
embodiment) are those to be read.

The focus controller 129 instructs the voice synthesis
controller 151 to synthesize for words whose position information
is 14 through 9999. The voice synthesis controller
151 temporarily stores, as the voice jump pointer 173, the
start position and the end position for the position information
for which voice synthesis should be performed. The
voice synthesis controller 151 examines the contents of the
voice synthesis file storage section 171 to determine whether
or not a voice file having the position information "14" is
present. When such a voice file is found, voice files that do
not correspond are ignored (or abandoned from the voice file
storage section 171), and only voice files whose position
information is 14 or greater are transmitted to the synthesizer
165. When no corresponding voice file exists, the voice
file storage section 171 is temporarily cleared, and voice
synthesization is performed for embedded commands, of an
embedded command file, for which the position information
is 14 or greater, so that reading beginning at a specified link
keyword is performed.

When key 4 is depressed, almost the same procedures are 
performed as for key 2. When key 4 is depressed, the user 
input section 127 detects this event, and transmits it to the 
focus controller 129. In response to this, the focus controller 
129 obtains pointer information stored in the focus pointer 
139. Further, the focus controller 129 refers to the word list 
131 to search for a word having a link head flag that is set 
to 1 and maximum position information that is smaller than 
a focus pointer. When, for example, the contents of the focus 
pointer is "14," it is ascertained that a word whose position 
information is "12" is the one to be read. 

The focus controller 129 instructs the voice synthesis 
controller 151 to synthesize for words whose position infor- 
mation is 12 through 9999. The voice synthesis controller 
151 temporarily stores, as the voice jump pointer 173, the 
start position and the end position for the position informa- 
tion for which voice synthesis should be performed. The 
voice synthesis controller 151 examines the contents of the 
voice synthesis file storage section 171 to determine whether 
or not a voice file having the position information "12" is
present. When such a voice file is found, voice files that do 
not correspond are ignored (or erased from the voice file 
storage section 171), and only voice files whose position 
information is 12 or greater are transmitted to the synthe- 
sizer 165. When no corresponding voice file is found, the 
voice file storage section 171 is temporarily cleared, and 
voice synthesization is performed for embedded commands, 
of an embedded command file, for which the position 
information is 12 or greater, so that reading beginning at a 
specified link keyword is performed. 

When key 5 is depressed, the focus controller 129 
searches the word list 131 for a word that has a link head flag 
that is set to 1 and for which the position information is the 
smallest of those greater than the focus pointer. The remain- 
ing process is the same as for key 4. 
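
The searches performed for keys 4 and 5 can be sketched as one small function. The name `move_focus` is hypothetical, the key numbers and search directions are taken as stated in the text, and leaving the focus pointer unchanged when no candidate exists is an assumption, since the text does not say what happens in that case.

```python
from typing import Iterable


def move_focus(link_heads: Iterable[int], focus_pointer: int, key: int) -> int:
    """New focus pointer after key 4 or key 5 is depressed."""
    heads = sorted(link_heads)
    if key == 4:
        # Link head with the largest position smaller than the focus pointer.
        candidates = [p for p in heads if p < focus_pointer]
        return candidates[-1] if candidates else focus_pointer
    if key == 5:
        # Link head with the smallest position greater than the focus pointer.
        candidates = [p for p in heads if p > focus_pointer]
        return candidates[0] if candidates else focus_pointer
    raise ValueError("only keys 4 and 5 move the focus")
```

Reading then starts from the returned position and continues to the end, in the same way as for key 2.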

When key 11 is depressed, the user input section 127 
detects this event, and transmits it to the focus controller 
129. In response to this, the focus controller 129 refers to the 
word list 131 to specify a word to be read. In other words, 
all of the words that have a link flag set to 1 are extracted. 

In the preferred embodiment of the present invention, a 
word, which succeeds the word whose link head flag is 1, 
that has a link flag set to 1 and a link head flag not set to 1, 
is determined to be a link keyword that carries a meaning. 
A word or a command to instruct a reading interval, such as 
a constant blank period, is inserted between meaningful link 
keywords, so that a silent period during which oral reading 
is not performed for a constant time interval is formed 
between the meaningful link keywords. 
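
The link-keyword-only reading can be sketched as below. This is an illustration under stated assumptions: the `Word` record, the `link_only_sequence` function, and the `<PAUSE>` token standing in for the reading-interval command are hypothetical, and the sketch assumes that end tags either are absent from the word list or carry a link flag of 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    position: int
    text: str
    link_flag: int = 0
    link_head_flag: int = 0


def link_only_sequence(word_list: List[Word], pause: str = "<PAUSE>") -> List[str]:
    """Build the reading sequence that contains only meaningful link keywords."""
    groups: List[List[str]] = []
    in_link = False
    for w in sorted(word_list, key=lambda w: w.position):
        if w.link_flag == 1 and w.link_head_flag == 1:
            # The link head opens a new link but is not itself read.
            groups.append([])
            in_link = True
        elif w.link_flag == 1 and in_link:
            # Link flag set, link head flag not set: a meaningful link keyword.
            groups[-1].append(w.text)
        else:
            in_link = False
    sequence: List[str] = []
    for group in groups:
        if not group:
            continue
        if sequence:
            # Silent period between one meaningful link keyword and the next.
            sequence.append(pause)
        sequence.extend(group)
    return sequence
```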

The focus controller 129 produces a new embedded 
command file, and instructs the voice synthesis controller 
151 to clear a currently stored voice file and to perform voice 
synthesization for the new embedded command file. Instead 
of producing a new embedded command file, the above 
described process can be performed by having the voice 
synthesizer extract a word, for which voice synthesization 
should be performed, from the embedded command file that is 
currently stored in the voice synthesis unit 150, and 
synthesize that word. 

The above described keyword control requiring key input 
can be replaced with a link keyword control employing 
voice input that uses a conventional voice recognition 
method. In this case, in addition to the user input section 127 
in FIG. 2, a voice recognizer is provided that receives oral 
input by a user, such as "link keyword" and "jump," that 
takes the place of the key input, identifies the input, and 
transmits to the focus controller 129 an instruction that 
corresponds to the identified input type. When oral input by 
the user is employed, it is preferable that a time be set for 
accepting user input and that voice synthesis be halted 
during that time. However, so long as an environment is 
provided wherein voice output does not affect the oral input, 
such as when a user uses a headphone and a microphone, the 
oral input and the voice output can be performed at the same 
time. 
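
As a rough illustration of how a recognized utterance could take the place of a key input, consider the sketch below. The instruction names `READ_LINK_KEYWORDS` and `JUMP_TO_LINK` and the `to_instruction` function are hypothetical; the description names only the example utterances "link keyword" and "jump," and the recognizer itself is outside the sketch.

```python
from typing import Optional

# Hypothetical mapping from recognized utterances to the instructions that
# would otherwise be triggered by key input.
VOICE_COMMANDS = {
    "link keyword": "READ_LINK_KEYWORDS",
    "jump": "JUMP_TO_LINK",
}


def to_instruction(utterance: str) -> Optional[str]:
    """Return the instruction for a recognized utterance, or None if unknown."""
    return VOICE_COMMANDS.get(utterance.strip().lower())
```

An utterance that is not in the table yields `None`, so the focus controller receives an instruction only for an identified input type.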

As is described above, according to the present invention, 
while a visually impaired user is listening to a voice 
expressing the contents of a sentence, the user can understand the 
contents by identifying the differences between voice 
attributes, and can perform an adequate operation for hyper 
text. 

The present invention can be employed when the contents 
of a data file having a text attribute or hyper text data are to 
be understood by employing a voice synthesis program. A 
visually impaired user, or a user whose situation is such that 
he cannot look at a screen, can listen to a voice reciting the 
contents of a sentence, and understand them by identifying 
the differences between voice attributes. As most of the 
WWW data on the Internet are provided as sentence data, 
when this data is orally read, a very large amount of WWW 
data can be obtained by vocal recitation. 
I claim as my invention: 
1. Apparatus for use with an HTML file that includes a 
plurality of link keywords, wherein position information and 
destination information are associated with each of the link 
keywords, and wherein each of the link keywords is bounded 
by a start tag and an end tag, the apparatus comprising: 
a communication controller for receiving an HTML file; 
means for producing a word list that includes link key- 
words from the HTML file, and corresponding position 
information for each of the link keywords; 
means for converting the start tag and the end tag of each 
of the link keywords into voice attribute information, 
and for correlating the voice attribute information with 
the position information of the corresponding link 
keyword to produce a voice attribute embedded 
command file; 

a voice synthesizer for converting, using the voice 
attribute information in the voice attribute embedded 
command file, the link keywords into speech; 

means for obtaining voice synthesis pointer information 
that corresponds to the link keyword currently being 
synthesized; 






means for determining the destination information corre- 
sponding to the voice synthesis pointer information; 

means for detecting a user input; and 

means for accessing, in response to the user input, the link 
destination currently selected by the means for 
determining the destination information. 

2. Apparatus for use with hyper text that includes a 
plurality of link keywords, wherein each of the link key- 
words includes corresponding link destination information, 
the apparatus comprising: 

means for producing a word list that includes link key- 
words and corresponding position information for the 
link keywords in the hyper text; 

means for producing voice synthesis input information 
that includes voice attribute information for the link 
keywords in the hyper text; 

synthesizing means for synthesizing, using the voice 
attribute information, the link keywords into speech; 

means for obtaining voice synthesis pointer information 
corresponding to the link keyword currently being 
synthesized; 

means for determining, using the voice synthesis pointer 
information, the destination information that 
corresponds to the link keyword currently being 
synthesized; 

means for detecting a user input instruction to jump to the 
link destination of the link keyword currently being 
synthesized; and 

means for accessing, in response to the user input, the link 
destination currently selected by the means for 
determining. 

3. Apparatus for synthesizing a sentence including a 
plurality of special words, comprising: 

means for producing a word list managing information for 
specifying the plurality of special words and position 
information for specifying a position of the plurality of 
special words in the sentence; 

means for producing voice synthesis input information by 
relating a voice attribute with each of the special words, 
and by converting the sentence; 

means for synthesizing the voice synthesis input infor- 
mation; 

means for obtaining a voice synthesis pointer information 
related to a position in the sentence related to a position 
currently synthesized; 

means for determining a special word related to a position 
currently synthesized by searching a position informa- 
tion in the sentence related to the voice synthesis 
pointer information in the word list; 

means for detecting user input instructing to change the 
voice synthesis position; 

means for obtaining, in response to the user input, from 
the word list a position information for special word 
adjacent to a specific word related to a position cur- 
rently synthesized; and 

means for synthesizing at a position related to the position 
information adjacent to special word. 